optimize computation in librrgraph utils #2714

heshpdx · 2024-09-05T04:40:41Z

Quick performance fix. In one hot loop, swap three expensive FDIVs for a single FDIV and three FMULs. Most CPUs execute FP multiplies magnitudes faster than FP divides.

No functional change seen, at least in my benchmarks.

In one hot loop, swap three expensive FDIVs for a single FDIV and three FMULs.

AlexandreSinger · 2024-09-17T18:09:20Z

@vaughnbetz I think this is ready to merge. It LGTM.

vaughnbetz · 2024-09-17T19:41:55Z

Thanks!

vaughnbetz · 2024-09-17T19:47:29Z

I wonder if the compiler would have factored this out ? No harm in any case to merge this ...

AlexandreSinger · 2024-09-17T20:45:08Z

@vaughnbetz I agree, I also think the compiler should handle this. The only thing that may not cause the compiler to do this optimization is precision. Division may yield a different result than multiplication by (1/x); therefore, if safe math compiler optimizations are turned on, the compiler cannot re-order the math operations. Since @heshpdx is working on the SPEC benchmark, it may be that safe math compiler optimizations are turned on to always give the same output regardless of optimization. That's my guess.

As far as I can tell, VTR does not turn on the fast-math optimization in GCC; so it should not reorder do anything in the floating point math which may yield different solutions. See: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-ffast-math

vaughnbetz · 2024-09-17T21:47:44Z

Good point; that makes sense.

heshpdx · 2024-09-17T22:32:52Z

Thank you for merging!

If num_switches were declared const, then the compiler might have a chance. But since it is not, the divide instructions are output every time. People worry too much about FP precision; in general this rarely leads to significant control flow changes. People also say "teach the compiler the cost model for this", and that is happening but mostly guarded under -mcpu=native type flags, and if you don't have the latest compiler you are out of luck. So I recommend fixing the software because it can produce an impact immediately and backwards for older compilers.

For SPEC CPU, we do not build VPR with -Ofast because VPR uses std::isnan so everything would break. There may be some subset of fast math options that are safe, but we haven't explored those yet. The code in question also produces divides with -O3 which is my default for testing. We offer a small amount of tolerance in output which would accommodate such changes, if they actually did produce a different answer (in this case they didn't).

This type of code is surprisingly common in the field, especially inside of for-loops. Take a look at my recent github history and you can see me tearing through applications offering this same perf optimization in droves. 😄

heshpdx · 2024-09-18T04:30:43Z

We offer a small amount of tolerance in output which would accommodate such changes, if they actually did produce a different answer (in this case they didn't).

Actually, I should correct this statement. A month ago I stopped offering tolerance! All outputs for the SPEC CPU version of VPR now much match exactly regardless of compiler or arch, and indeed they do match through testing across a variety of systems. And this patch did not move the needle.

vaughnbetz · 2024-09-18T12:24:33Z

That's good news! Is this with Amin's change to read in initial placement and use a power of 2 channel width in placement?
@amin1377

optimize computation in librrgraph utils

a25b7ec

In one hot loop, swap three expensive FDIVs for a single FDIV and three FMULs.

github-actions bot added the lang-cpp C/C++ code label Sep 5, 2024

Merge branch 'master' into master

734d54a

vaughnbetz merged commit f43e070 into verilog-to-routing:master Sep 17, 2024
53 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

optimize computation in librrgraph utils #2714

optimize computation in librrgraph utils #2714

Uh oh!

heshpdx commented Sep 5, 2024

Uh oh!

AlexandreSinger commented Sep 17, 2024

Uh oh!

vaughnbetz commented Sep 17, 2024

Uh oh!

Uh oh!

vaughnbetz commented Sep 17, 2024

Uh oh!

AlexandreSinger commented Sep 17, 2024 •

edited

Loading

Uh oh!

vaughnbetz commented Sep 17, 2024

Uh oh!

heshpdx commented Sep 17, 2024

Uh oh!

heshpdx commented Sep 18, 2024

Uh oh!

vaughnbetz commented Sep 18, 2024 •

edited

Loading

Uh oh!

Uh oh!

optimize computation in librrgraph utils #2714

optimize computation in librrgraph utils #2714

Uh oh!

Conversation

heshpdx commented Sep 5, 2024

Uh oh!

AlexandreSinger commented Sep 17, 2024

Uh oh!

vaughnbetz commented Sep 17, 2024

Uh oh!

Uh oh!

vaughnbetz commented Sep 17, 2024

Uh oh!

AlexandreSinger commented Sep 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vaughnbetz commented Sep 17, 2024

Uh oh!

heshpdx commented Sep 17, 2024

Uh oh!

heshpdx commented Sep 18, 2024

Uh oh!

vaughnbetz commented Sep 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

AlexandreSinger commented Sep 17, 2024 •

edited

Loading

vaughnbetz commented Sep 18, 2024 •

edited

Loading